{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Regional analysis\n",
    "\n",
    "Maybe you're just interested in a particular region of the Arctic. Here, we want to filter our dataset to only include data from that region. We can do this using the NSIDC region mask for the Arctic, which is included in the xarray dataset available in the jupyter bucket. The function used in this notebook relies on the [NSIDC region mask for the Arctic](https://nsidc.org/data/polar-stereo/tools_masks.html#region_masks) included in the xarray dataset from the jupyter book's google bucket. This function is referenced in other notebooks in this book. \n",
    " \n",
    "**Input**:\n",
    " - xarray dataset from the jupyter book's google bucket\n",
    " \n",
    " \n",
    " **Output**: \n",
    "  - plots of sea ice thickness and sea ice type in the Arctic"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```{tip}\n",
    "Try running this notebook in Google Colab! Toggle over the rocketship icon at the top of the page and click Colab to open a new window and run the notebook. <br><br>To run a single cell, type **Shift+Enter**. To run the whole notebook, under **Runtime** click **Run all**. Note that you will have to run the notebook from the very beginning and load all the Google Colab dependencies for the code to work.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "remove-cell"
    ]
   },
   "outputs": [],
   "source": [
    "#this cell will load dependencies for running the notebook in Google Colab\n",
    "#this cell may take a while to run\n",
    "import sys\n",
    "\n",
    "#if code is running in google colab, run these cells to install neccessary libraries\n",
    "if 'google.colab' in sys.modules: \n",
    "    !pip install netcdf4\n",
    "    !pip install xarray==0.16.0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import notebook dependencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "remove-output"
    ]
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import numpy as np\n",
    "import xarray as xr\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load data into notebook\n",
    "Copy file from the book's google bucket and load into an xarray dataset. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "remove-output"
    ]
   },
   "outputs": [],
   "source": [
    "!gsutil -m cp gs://is2-pso-seaice/icesat2-book-data.nc ./\n",
    "dataset = xr.open_dataset('icesat2-book-data.nc')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Display region labels and keys \n",
    " - Each region defined by the NSIDC region mask corresponds to a integer key that can be used to reference it.\n",
    " - The keys and labels are stored as attributes to the <span style=\"color:darkmagenta; font-family: Courier\">region_mask</span> coordinate value in the NETCDF4 file.\n",
    " - The code below creates a pandas DataFrame object to show all of the keys and their corresponding labels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#get region mask attributes as a dictionary object \n",
    "regionMask = dataset.region_mask.attrs\n",
    "\n",
    "#get region keys and labels\n",
    "keys = regionMask['keys']\n",
    "labels = regionMask['labels']\n",
    "\n",
    "#create and display table of keys and labels \n",
    "tbl = pd.DataFrame({'key':keys, 'label':labels})\n",
    "display(tbl.style.hide_index())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define region/s of interest by key\n",
    " - Define variable <span style=\"color:darkmagenta; font-family: Courier\">regionKeyList</span>, a list of integers corresponding to keys you want to restrict the data to. \n",
    "     - Make sure <span style=\"color:darkmagenta; font-family: Courier\">regionKeyList</span> is a list, even if you just want one region (i.e. <span style=\"color:darkmagenta; font-family: Courier\">regionKeyList = [15]</span> for the Arctic Ocean) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "regionKeyList = [10,11,12,13,15] #Inner Arctic\n",
    "#regionKeyList = [13] #Beaufort Sea\n",
    "#regionKeyList = [10] #Laptev Sea\n",
    "#regionKeyList = list(tbl['key']) #if you want to keep all regions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Restrict the data to desired region/s\n",
    "This function can also be accessed from the utils.py file, stored in the google bucket for this book. \n",
    " - Function will return a new dataset restricted to the input keys by applying a condition that the region mask needs to be equal to the key/s. \n",
    " - The code will raise a ValueError if <span style=\"color:darkmagenta; font-family: Courier\">regionKeyList</span> is not a list \n",
    " - The code will raise a ValueError if <span style=\"color:darkmagenta; font-family: Courier\">regionKeyList</span> contains invalid keys \n",
    "     - Invalid keys are integer values that do not correspond to a region as defined by the NSIDC region mask for the arctic\n",
    " - The code will raise a warning if you remove all regions from the data \n",
    "     - This occurs when <span style=\"color:darkmagenta; font-family: Courier\">regionKeyList</span> equals an empty list (<span style=\"color:darkmagenta; font-family: Courier\">regionKeyList = []</span>) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def restrictRegionally(dataset, regionKeyList): \n",
    "    \"\"\"Restrict dataset to input regions.\n",
    "    \n",
    "    Args: \n",
    "        dataset (xr Dataset): dataset generated by Load_IS2 notebook\n",
    "        regionKeyList (list): list of region keys to restrict data to \n",
    "        \n",
    "    Returns: \n",
    "        regionalDataset (xr Dataset): dataset with restricted data to input regions\n",
    "    \"\"\"\n",
    "    \n",
    "    def checkKeys(regionKeyList, regionTbl): \n",
    "        \"\"\"Check that regionKeyList was defined correctly\n",
    "\n",
    "        Raises: \n",
    "            ValueError if regionKeyList was not defined correctly \n",
    "            warning if all data was removed from the dataset\n",
    "        \"\"\"\n",
    "        if type(regionKeyList) != list: #raise a ValueError if regionKeyList is not a list \n",
    "            raise ValueError('regionKeyList needs to be a list. \\nFor example, if you want to restrict data to the Beaufort Sea, define regionKeyList = [13]')\n",
    "\n",
    "        for key in regionKeyList: \n",
    "            if key not in list(regionTbl['key']): \n",
    "                raise ValueError('Region key ' + str(key) + ' does not exist in region mask. \\n Redefine regionKeyList with key numbers from table')\n",
    "\n",
    "        if len(regionKeyList) == 0: \n",
    "            warnings.warn('You removed all the data from the dataset. Are you sure you wanted to do this? \\n If not, make sure the list regionKeyList is not empty and try again. \\n If you intended to keep data from all regions, set regionKeyList = list(tbl[\\\"key\\\"])')\n",
    " \n",
    "    #create a table of keys and labels\n",
    "    regionMask = dataset.region_mask.attrs\n",
    "    regionTbl = pd.DataFrame({'key': regionMask['keys'], 'label': regionMask['labels']})\n",
    "    \n",
    "    #call function to check if regionKeyList was defined correctly\n",
    "    checkKeys(regionKeyList, regionTbl)\n",
    "    \n",
    "    #keys to remove (all keys that are note listed in regionKeyList)\n",
    "    keysToRemove = [key for key in list(regionTbl['key']) if key not in regionKeyList]\n",
    "    \n",
    "    #filter elements from the ice thickness DataArray where the region is the desired region\n",
    "    regionalDataset = dataset.copy()\n",
    "    for var in dataset.data_vars: \n",
    "        if var != 'seaice_conc_monthly_cdr':\n",
    "            regionalVar = regionalDataset[var]\n",
    "            for key in keysToRemove: \n",
    "                regionalVar = regionalVar.where(regionalVar['region_mask'] != key)\n",
    "            regionalDataset[var] = regionalVar\n",
    "    \n",
    "    #find name of labels \n",
    "    labels = [regionTbl[regionTbl['key'] == key]['label'].item() for key in regionKeyList]\n",
    "    \n",
    "    #add new attributes describing changes made to the dataset\n",
    "    if len(labels) < len(regionTbl['key']): \n",
    "        if set(regionKeyList) == set([10,11,12,13,15]): #convert to sets so unordered lists are compared\n",
    "            regionalDataset.attrs['regions with data'] = 'Inner Arctic'\n",
    "        else:    \n",
    "            regionalDataset.attrs['regions with data'] = ('%s' % ', '.join(map(str, labels)))\n",
    "        print('Regions selected: ' + regionalDataset.attrs['regions with data'])\n",
    "    else: \n",
    "        regionalDataset.attrs['regions with data'] = 'All'\n",
    "        print('Regions selected: All \\nNo regions will be removed')\n",
    "    \n",
    "    return regionalDataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Call function to restrict the data regionally"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#define a list of keys corresponding to the region of interest\n",
    "regionKeyList = [10,11,12,13,15] #Inner Arctic\n",
    "\n",
    "#restrict data to that region\n",
    "dataset = restrictRegionally(dataset, regionKeyList)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "celltoolbar": "Tags",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
